
Load the tweets and check that they were loaded correctly. We also inspect the summary for a first interpretation. The summary(tweets) output reveals the following:

# Set working directory
# getwd()
# setwd("./data/")

# Load data
load("../data/Tweets_all.rda")

# Check that tweets are loaded
head(tweets)
## # A tibble: 6 × 14
##   created_at               id id_str            full_text in_reply_to_screen_n…¹
##   <dttm>                <dbl> <chr>             <chr>     <chr>                 
## 1 2023-01-20 17:17:32 1.62e18 1616469988369469… "Im MSc … <NA>                  
## 2 2023-01-13 07:52:01 1.61e18 1613790954737074… "Was bew… <NA>                  
## 3 2023-01-12 19:30:01 1.61e18 1613604227141537… "Was uns… <NA>                  
## 4 2023-01-12 08:23:00 1.61e18 1613436367169634… "Eine di… <NA>                  
## 5 2023-01-11 14:00:05 1.61e18 1613158809081450… "Wir gra… <NA>                  
## 6 2023-01-10 17:06:11 1.61e18 1612843252083834… "Unsere … <NA>                  
## # ℹ abbreviated name: ¹​in_reply_to_screen_name
## # ℹ 9 more variables: retweet_count <int>, favorite_count <int>, lang <chr>,
## #   university <chr>, tweet_date <dttm>, tweet_minute <dttm>,
## #   tweet_hour <dttm>, tweet_month <date>, timeofday_hour <chr>
summary(tweets)
##    created_at                          id               id_str         
##  Min.   :2009-09-29 14:29:47.0   Min.   :4.469e+09   Length:19575      
##  1st Qu.:2015-01-28 15:07:41.5   1st Qu.:5.604e+17   Class :character  
##  Median :2018-04-13 13:26:56.0   Median :9.848e+17   Mode  :character  
##  Mean   :2017-12-09 15:26:50.7   Mean   :9.400e+17                     
##  3rd Qu.:2020-10-20 10:34:50.0   3rd Qu.:1.318e+18                     
##  Max.   :2023-01-26 14:49:31.0   Max.   :1.619e+18                     
##   full_text         in_reply_to_screen_name retweet_count     favorite_count  
##  Length:19575       Length:19575            Min.   :  0.000   Min.   :  0.00  
##  Class :character   Class :character        1st Qu.:  0.000   1st Qu.:  0.00  
##  Mode  :character   Mode  :character        Median :  1.000   Median :  0.00  
##                                             Mean   :  1.289   Mean   :  1.37  
##                                             3rd Qu.:  2.000   3rd Qu.:  2.00  
##                                             Max.   :267.000   Max.   :188.00  
##      lang            university          tweet_date                    
##  Length:19575       Length:19575       Min.   :2009-09-29 00:00:00.00  
##  Class :character   Class :character   1st Qu.:2015-01-28 00:00:00.00  
##  Mode  :character   Mode  :character   Median :2018-04-13 00:00:00.00  
##                                        Mean   :2017-12-09 02:25:45.00  
##                                        3rd Qu.:2020-10-20 00:00:00.00  
##                                        Max.   :2023-01-26 00:00:00.00  
##   tweet_minute                      tweet_hour                    
##  Min.   :2009-09-29 14:29:00.00   Min.   :2009-09-29 14:00:00.00  
##  1st Qu.:2015-01-28 15:07:00.00   1st Qu.:2015-01-28 14:30:00.00  
##  Median :2018-04-13 13:26:00.00   Median :2018-04-13 13:00:00.00  
##  Mean   :2017-12-09 15:26:24.68   Mean   :2017-12-09 14:59:43.81  
##  3rd Qu.:2020-10-20 10:34:30.00   3rd Qu.:2020-10-20 10:00:00.00  
##  Max.   :2023-01-26 14:49:00.00   Max.   :2023-01-26 14:00:00.00  
##   tweet_month         timeofday_hour    
##  Min.   :2009-09-01   Length:19575      
##  1st Qu.:2015-01-01   Class :character  
##  Median :2018-04-01   Mode  :character  
##  Mean   :2017-11-24                     
##  3rd Qu.:2020-10-01                     
##  Max.   :2023-01-01

Start preprocessing the tweets; to calculate the intervals between them, some additional properties are needed. The preprocessing transforms the raw tweet data into a structured format suitable for analysis: timestamps are converted to POSIXct, date/weekday/year/month fields are derived, the language is detected, and emojis are extracted.

# Preprocessing step: convert the timestamp to POSIXct and derive date, weekday,
# year and month fields; cast university to character, detect the language and
# extract emojis. Weekday labels follow the system locale; the week starts on Monday.
tweets <- tweets %>%
  mutate(
    created_at = as.POSIXct(created_at, format = "%Y-%m-%d %H:%M:%S"),
    date = as.Date(created_at),
    day = lubridate::wday(created_at,
      label = TRUE, abbr = FALSE,
      week_start = getOption("lubridate.week.start", 1),
      locale = Sys.getlocale("LC_TIME")
    ),
    year = year(created_at),
    month = floor_date(created_at, "month"),
    university = as.character(university),
    lang = detect_language(full_text),
    full_text_emojis = replace_emoji(full_text, emoji_dt = lexicon::hash_emojis)
  )

# Helper function to remove emoji tags
# replace_emoji() inserts each emoji into the text as a tag plus its name; we strip the tags here
remove_emoji_tags <- function(text) {
  str_remove_all(text, "<[a-z0-9]{2}>")
}
# Remove Emoji Tags
tweets$full_text_emojis <- sapply(tweets$full_text_emojis, remove_emoji_tags)
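As a quick standalone check of the tag pattern, here is a base-R sketch using gsub, which is equivalent to str_remove_all for this fixed pattern; the sample string is made up:

```r
# Base-R equivalent of remove_emoji_tags(): strips two-character
# lowercase/digit tags such as "<e3>" or "<9f>" (sample text is hypothetical)
remove_tags_base <- function(text) {
  gsub("<[a-z0-9]{2}>", "", text)
}
remove_tags_base("Congrats <e3> to all graduates<9f>!")
# -> "Congrats  to all graduates!"
```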

# Store emojis in a separate column to analyze later
tweets$emoji_unicode <- tweets %>%
  emoji_extract_nest(full_text) %>%
  select(.emoji_unicode)

Question 1: How many tweets are being posted by the various Universities when? Are there any ‘release’ strategies visible?

# Count each tweet by university and hour of the day
tweet_counts_by_hour_of_day <- tweets %>%
  group_by(university, timeofday_hour) %>%
  count() %>%
  arrange(university, timeofday_hour)

# Plot the number of tweets by university and hour of the day
ggplot(
  tweet_counts_by_hour_of_day,
  aes(
    x = timeofday_hour, y = n,
    color = university, group = university
  )
) +
  geom_line() +
  facet_wrap(~university) +
  labs(
    title = "Number of tweets by university and hour",
    x = "Hour of day",
    y = "Number of tweets"
  )

# Show most active hours for each university
hours_with_most_tweets_by_uni <- tweet_counts_by_hour_of_day %>%
  group_by(university, timeofday_hour) %>%
  summarize(total_tweets = sum(n)) %>%
  group_by(university) %>%
  slice_max(n = 1, order_by = total_tweets)

print(hours_with_most_tweets_by_uni)
## # A tibble: 8 × 3
## # Groups:   university [8]
##   university     timeofday_hour total_tweets
##   <chr>          <chr>                 <int>
## 1 FHNW           09                      344
## 2 FH_Graubuenden 11                      493
## 3 ZHAW           17                      580
## 4 bfh            08                      497
## 5 hes_so         10                      315
## 6 hslu           09                      380
## 7 ost_fh         08                       44
## 8 supsi_ch       11                      330
# Show most active hour overall
hour_with_most_tweets <- tweet_counts_by_hour_of_day %>%
  group_by(timeofday_hour) %>%
  summarize(total_tweets = sum(n)) %>%
  arrange(desc(total_tweets)) %>%
  slice_max(n = 1, order_by = total_tweets)

print(hour_with_most_tweets)
## # A tibble: 1 × 2
##   timeofday_hour total_tweets
##   <chr>                 <int>
## 1 11                     2356
# Count each tweet by university and weekday
tweet_counts_by_week_day <- tweets %>%
  group_by(university, day) %>%
  count() %>%
  arrange(university, day)

# Plot the number of tweets by university and day of the week
ggplot(
  tweet_counts_by_week_day,
  aes(
    x = day, y = n,
    color = university,
    group = university
  )
) +
  geom_line() +
  facet_wrap(~university) +
  labs(
    title = "Number of tweets by university and day of the week",
    x = "Day of the week",
    y = "Number of tweets"
  )

# Show most active days for each university
days_with_most_tweets_by_uni <- tweet_counts_by_week_day %>%
  group_by(university, day) %>%
  summarize(total_tweets = sum(n)) %>%
  group_by(university) %>%
  slice_max(n = 1, order_by = total_tweets)

print(days_with_most_tweets_by_uni)
## # A tibble: 8 × 3
## # Groups:   university [8]
##   university     day       total_tweets
##   <chr>          <ord>            <int>
## 1 FHNW           Tuesday            575
## 2 FH_Graubuenden Tuesday            751
## 3 ZHAW           Wednesday          636
## 4 bfh            Tuesday            651
## 5 hes_so         Tuesday            415
## 6 hslu           Thursday           603
## 7 ost_fh         Friday              65
## 8 supsi_ch       Friday             461
# Calculate time intervals between tweets
# Helper: return the (first) most frequent value of a vector
find_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
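For reference, find_mode() picks the value that occurs most often; on ties it returns the value seen first. A minimal standalone illustration (the definition is repeated so the snippet runs on its own):

```r
# Most frequent value of a vector; ties resolve to the first value encountered
find_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
find_mode(c(5, 3, 5, 9, 3, 5))    # -> 5
find_mode(c("a", "b", "b", "a"))  # -> "a" (tie; "a" was seen first)
```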

tweets <- tweets %>%
  arrange(university, created_at) %>%
  group_by(university) %>%
  mutate(time_interval = as.numeric(
    difftime(created_at, lag(created_at), units = "mins")
  ))

# Descriptive statistics of time intervals
summary(tweets$time_interval)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##      0.0    148.2   1128.8   2097.6   2428.3 220707.0        8
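The lag()/difftime() logic above can be sketched on toy timestamps; base R's diff() yields the same successive differences, with NA for the first tweet of a group (the timestamps below are made up):

```r
# Three hypothetical tweet times: expected intervals are NA, 30 and 90 minutes
ts <- as.POSIXct(
  c("2023-01-01 10:00:00", "2023-01-01 10:30:00", "2023-01-01 12:00:00"),
  tz = "UTC"
)
c(NA, as.numeric(diff(ts), units = "mins"))  # -> NA 30 90
```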
# setwd("../4.Text-Mining-Groupwork/plots")
unique_years <- tweets$year %>% unique()
# Plot the distribution of time intervals between tweets for each year
for (curr_year in unique_years) {
  # Filter data for the specific year
  filtered_data <- tweets %>%
    filter(year(created_at) == curr_year)

  print(ggplot(filtered_data, aes(x = time_interval)) +
    geom_histogram(fill = "lightblue") +
    facet_wrap(~university) +
    labs(
      title = paste0(
        "Distribution of time intervals between tweets - ", curr_year
      ),
      x = "Time interval (minutes)",
      y = "Tweet count"
    ))
  universities <- filtered_data$university %>% unique()
  for (uni in universities) {
    # Filter data for the specific university
    uni_filtered_data <- filtered_data %>%
      filter(university == uni)

    print(ggplot(uni_filtered_data, aes(x = time_interval)) +
      geom_histogram(fill = "lightblue") +
      labs(
        title = paste0(
          "Distribution of time intervals between tweets for ", uni,
          " in ", curr_year
        ),
        x = "Time interval (minutes)",
        y = "Tweet count"
      ))
    # Calculate mode (most common interval) in hours
    most_common_interval_minutes <- find_mode(uni_filtered_data$time_interval)
    most_common_interval_hours <- most_common_interval_minutes / 60
    print(paste0(
      "Most common time interval for ", uni,
      " in ",
      curr_year,
      " is ", most_common_interval_minutes,
      " minutes (", most_common_interval_hours, " hours)"
    ))
  }
}

## [1] "Most common time interval for FHNW in 2011 is NA minutes (NA hours)"

## [1] "Most common time interval for FH_Graubuenden in 2011 is 23210.3 minutes (386.838333333333 hours)"

## [1] "Most common time interval for hes_so in 2011 is 1.55 minutes (0.0258333333333333 hours)"

## [1] "Most common time interval for FHNW in 2012 is 17324.65 minutes (288.744166666667 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2012 is 0.9 minutes (0.015 hours)"

## [1] "Most common time interval for ZHAW in 2012 is NA minutes (NA hours)"

## [1] "Most common time interval for bfh in 2012 is NA minutes (NA hours)"

## [1] "Most common time interval for hes_so in 2012 is 22086.35 minutes (368.105833333333 hours)"

## [1] "Most common time interval for FHNW in 2013 is 1.26666666666667 minutes (0.0211111111111111 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2013 is 21879.45 minutes (364.6575 hours)"

## [1] "Most common time interval for ZHAW in 2013 is 0.583333333333333 minutes (0.00972222222222222 hours)"

## [1] "Most common time interval for bfh in 2013 is 65.0833333333333 minutes (1.08472222222222 hours)"

## [1] "Most common time interval for hes_so in 2013 is 36252.5833333333 minutes (604.209722222222 hours)"

## [1] "Most common time interval for supsi_ch in 2013 is 0.783333333333333 minutes (0.0130555555555556 hours)"

## [1] "Most common time interval for FHNW in 2014 is 4.58333333333333 minutes (0.0763888888888889 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2014 is 0.183333333333333 minutes (0.00305555555555556 hours)"

## [1] "Most common time interval for ZHAW in 2014 is 0.05 minutes (0.000833333333333333 hours)"

## [1] "Most common time interval for bfh in 2014 is 153.35 minutes (2.55583333333333 hours)"

## [1] "Most common time interval for hes_so in 2014 is 21986.6 minutes (366.443333333333 hours)"

## [1] "Most common time interval for supsi_ch in 2014 is 37496.4833333333 minutes (624.941388888889 hours)"

## [1] "Most common time interval for FHNW in 2015 is 48918.3 minutes (815.305 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2015 is 1139.9 minutes (18.9983333333333 hours)"

## [1] "Most common time interval for ZHAW in 2015 is 0.316666666666667 minutes (0.00527777777777778 hours)"

## [1] "Most common time interval for bfh in 2015 is 20272.0333333333 minutes (337.867222222222 hours)"

## [1] "Most common time interval for hes_so in 2015 is 0.166666666666667 minutes (0.00277777777777778 hours)"

## [1] "Most common time interval for supsi_ch in 2015 is 43496.6333333333 minutes (724.943888888889 hours)"

## [1] "Most common time interval for FHNW in 2016 is 34708.6666666667 minutes (578.477777777778 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2016 is 240.05 minutes (4.00083333333333 hours)"

## [1] "Most common time interval for ZHAW in 2016 is 21.2 minutes (0.353333333333333 hours)"

## [1] "Most common time interval for bfh in 2016 is 0.0833333333333333 minutes (0.00138888888888889 hours)"

## [1] "Most common time interval for hes_so in 2016 is 2.7 minutes (0.045 hours)"

## [1] "Most common time interval for hslu in 2016 is NA minutes (NA hours)"

## [1] "Most common time interval for supsi_ch in 2016 is 1.58333333333333 minutes (0.0263888888888889 hours)"

## [1] "Most common time interval for FHNW in 2017 is 48748.5333333333 minutes (812.475555555556 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2017 is 5617.83333333333 minutes (93.6305555555555 hours)"

## [1] "Most common time interval for ZHAW in 2017 is 6954.43333333333 minutes (115.907222222222 hours)"

## [1] "Most common time interval for bfh in 2017 is 18606.6666666667 minutes (310.111111111111 hours)"

## [1] "Most common time interval for hes_so in 2017 is 71909.9833333333 minutes (1198.49972222222 hours)"

## [1] "Most common time interval for hslu in 2017 is 0.266666666666667 minutes (0.00444444444444444 hours)"

## [1] "Most common time interval for supsi_ch in 2017 is 1.36666666666667 minutes (0.0227777777777778 hours)"

## [1] "Most common time interval for FHNW in 2018 is 0.166666666666667 minutes (0.00277777777777778 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2018 is 1446.23333333333 minutes (24.1038888888889 hours)"

## [1] "Most common time interval for ZHAW in 2018 is 5689.93333333333 minutes (94.8322222222222 hours)"

## [1] "Most common time interval for bfh in 2018 is 20172.05 minutes (336.200833333333 hours)"

## [1] "Most common time interval for hes_so in 2018 is 31170.8333333333 minutes (519.513888888889 hours)"

## [1] "Most common time interval for hslu in 2018 is 0.233333333333333 minutes (0.00388888888888889 hours)"

## [1] "Most common time interval for supsi_ch in 2018 is 0.183333333333333 minutes (0.00305555555555556 hours)"

## [1] "Most common time interval for FHNW in 2019 is 315.233333333333 minutes (5.25388888888889 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2019 is 10079.85 minutes (167.9975 hours)"

## [1] "Most common time interval for ZHAW in 2019 is 1255.61666666667 minutes (20.9269444444444 hours)"

## [1] "Most common time interval for bfh in 2019 is 1440.05 minutes (24.0008333333333 hours)"

## [1] "Most common time interval for hes_so in 2019 is 1140.03333333333 minutes (19.0005555555556 hours)"

## [1] "Most common time interval for hslu in 2019 is 1.95 minutes (0.0325 hours)"

## [1] "Most common time interval for supsi_ch in 2019 is 15 minutes (0.25 hours)"

## [1] "Most common time interval for FHNW in 2020 is 3180.16666666667 minutes (53.0027777777778 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2020 is 2880.03333333333 minutes (48.0005555555556 hours)"

## [1] "Most common time interval for ZHAW in 2020 is 13693.7666666667 minutes (228.229444444444 hours)"

## [1] "Most common time interval for bfh in 2020 is 14531.7333333333 minutes (242.195555555556 hours)"

## [1] "Most common time interval for hes_so in 2020 is 1139.91666666667 minutes (18.9986111111111 hours)"

## [1] "Most common time interval for hslu in 2020 is 120 minutes (2 hours)"

## [1] "Most common time interval for ost_fh in 2020 is NA minutes (NA hours)"

## [1] "Most common time interval for supsi_ch in 2020 is 0.133333333333333 minutes (0.00222222222222222 hours)"

## [1] "Most common time interval for FHNW in 2021 is 0.5 minutes (0.00833333333333333 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2021 is 0.333333333333333 minutes (0.00555555555555555 hours)"

## [1] "Most common time interval for ZHAW in 2021 is 13043.9833333333 minutes (217.399722222222 hours)"

## [1] "Most common time interval for bfh in 2021 is 1411.05 minutes (23.5175 hours)"

## [1] "Most common time interval for hes_so in 2021 is 0 minutes (0 hours)"

## [1] "Most common time interval for hslu in 2021 is 0 minutes (0 hours)"

## [1] "Most common time interval for ost_fh in 2021 is 0.35 minutes (0.00583333333333333 hours)"

## [1] "Most common time interval for supsi_ch in 2021 is 1140 minutes (19 hours)"

## [1] "Most common time interval for FHNW in 2022 is 1439.93333333333 minutes (23.9988888888889 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2022 is 0.1 minutes (0.00166666666666667 hours)"

## [1] "Most common time interval for ZHAW in 2022 is 18623.7166666667 minutes (310.395277777778 hours)"

## [1] "Most common time interval for bfh in 2022 is 7192.96666666667 minutes (119.882777777778 hours)"

## [1] "Most common time interval for hes_so in 2022 is 5798.53333333333 minutes (96.6422222222222 hours)"

## [1] "Most common time interval for hslu in 2022 is 0 minutes (0 hours)"

## [1] "Most common time interval for ost_fh in 2022 is 0.133333333333333 minutes (0.00222222222222222 hours)"

## [1] "Most common time interval for supsi_ch in 2022 is 28800.7333333333 minutes (480.012222222222 hours)"

## [1] "Most common time interval for FHNW in 2023 is 9997.63333333333 minutes (166.627222222222 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2023 is 21962.3833333333 minutes (366.039722222222 hours)"

## [1] "Most common time interval for ZHAW in 2023 is 70740.3333333333 minutes (1179.00555555556 hours)"

## [1] "Most common time interval for bfh in 2023 is 8000.11666666667 minutes (133.335277777778 hours)"

## [1] "Most common time interval for hes_so in 2023 is 4621.1 minutes (77.0183333333333 hours)"

## [1] "Most common time interval for hslu in 2023 is 627.083333333333 minutes (10.4513888888889 hours)"

## [1] "Most common time interval for supsi_ch in 2023 is 7199 minutes (119.983333333333 hours)"

## [1] "Most common time interval for FH_Graubuenden in 2009 is NA minutes (NA hours)"

## [1] "Most common time interval for FH_Graubuenden in 2010 is 55732.2833333333 minutes (928.871388888889 hours)"

## [1] "Most common time interval for hes_so in 2010 is NA minutes (NA hours)"

Question 2: What are the tweets about and how do other Twitter users react to them (likes, etc.)?

Data Preprocessing

langs <- c("de", "fr", "it", "en")
tweets_filtered <- tweets %>%
  filter(lang %in% langs)
# Define extended stopwords (outside the loop for efficiency)
# Remove 'amp': it is just the HTML entity for the '&' symbol, not a meaningful word
# Remove 'rt': the retweet marker (it also appears as a fragment of words such as 'engagiert')
extended_stopwords <- c(
  "#fhnw", "#bfh", "@htw_chur", "#hslu", "#supsi", "#sups",
  "amp", "rt", "fr", "ber", "t.co", "https", "http", "www", "com", "html"
)
# Create separate DFMs for each language
dfm_list <- list()
for (sel_lang in langs) {
  # Subset tweets for the current language
  tweets_lang <- tweets_filtered %>%
    filter(lang == sel_lang)
  # Create tokens for the current language
  stopwords_lang <- stopwords(sel_lang)
  # Create tokens for all tweets:
  # - create a corpus first, because tokens() only works on character, corpus,
  #   list, tokens or tokens_xptr objects
  # - create tokens and remove URLs, punctuation, numbers, symbols and separators
  # - transform to lowercase
  # - stem all words
  # - keep unigrams only (bigrams and trigrams are shown later)
  # - remove the stopwords after stemming, because stemming strips the endings
  #   of some words, so the stemmed forms must be matched
  tokens_lang <- tweets_lang %>%
    corpus(text_field = "full_text_emojis") %>%
    tokens(
      remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
      remove_url = TRUE, remove_separators = TRUE
    ) %>%
    tokens_tolower() %>%
    tokens_wordstem(lang = sel_lang) %>%
    tokens_ngrams(n = 1) %>%
    tokens_select(
      pattern =
        c(stopwords_lang, extended_stopwords), selection = "remove"
    )
  # Create DFM for the current language
  dfm_list[[sel_lang]] <- dfm(tokens_lang)
}

Content Analysis

  • HSLU posts many English tweets
  • BFH and FHNW post mostly German tweets
  • SUPSI posts many Italian tweets; its most common words are Italian
  • HES-SO posts many French tweets; its most common words are French
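These observations can be verified by tabulating the detected language per university, e.g. with base R's table(); a minimal sketch on a made-up data frame (on the real data, tweets_filtered would take the place of toy):

```r
# Hypothetical stand-in for tweets_filtered with only the two relevant columns
toy <- data.frame(
  university = c("hslu", "hslu", "bfh", "FHNW", "supsi_ch", "hes_so"),
  lang       = c("en",   "en",   "de",  "de",   "it",       "fr")
)
table(toy$university, toy$lang)
```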
# Word Frequencies & Visualization
words_freqs_en <- sort(colSums(dfm_list$en), decreasing = TRUE)
head(words_freqs_en, 20)
##     student         new       @hslu     univers     project       thank 
##         106          74          70          62          60          60 
##       @zhaw         day      scienc       today       innov         now 
##          59          56          54          52          51          50 
##       swiss switzerland       @fhnw       great          us        join 
##          49          49          46          46          44          43 
##       studi    research 
##          42          42
wordcloud2(data.frame(
  word = names(words_freqs_en),
  freq = words_freqs_en
), size = 0.5)
words_freqs_de <- sort(colSums(dfm_list$de), decreasing = TRUE)
head(words_freqs_de, 20)
##       neu      mehr   schweiz      werd       all   studier      heut hochschul 
##      1586      1104       967       772       706       706       638       601 
##       bfh      jahr     knnen   digital     thema     studi   projekt     welch 
##       577       535       507       499       497       466       465       462 
##      bern     statt     zeigt    arbeit 
##       454       451       437       434
wordcloud2(data.frame(
  word = names(words_freqs_de),
  freq = words_freqs_de
), size = 0.5)
word_freqs_it <- sort(colSums(dfm_list$it), decreasing = TRUE)
head(word_freqs_it, 20)
##        nuov        sups     progett     student     present        info 
##         210         208         173         146         143         143 
##   iscrizion        cors      ricerc   formazion  #supsinews #supsievent 
##         142         141         135         134         134         129 
##       scopr      inform      diplom    bachelor       apert        tutt 
##         123         120         116         111         110         105 
##      master          pi 
##         103         102
wordcloud2(data.frame(
  word = names(word_freqs_it),
  freq = word_freqs_it
), size = 0.5)
# Some tokens look like English words ('right', 'arrow', 'book'); these are
# likely emoji names inserted by replace_emoji()
word_freqs_fr <- sort(colSums(dfm_list$fr), decreasing = TRUE)
head(word_freqs_fr, 20)
##    hes-so     right     arrow       dan    projet         a      tudi      haut 
##       505       432       324       249       248       234       199       183 
##       col   @hes_so @hessoval    dcouvr      book      open  recherch   #hes_so 
##       155       140       129       127       123       118       117       115 
##     suiss      plus      mast   nouveau 
##       110       105       103        98
wordcloud2(data.frame(
  word = names(word_freqs_fr),
  freq = word_freqs_fr
), size = 0.5)
# University-specific Analysis
for (uni in unique(tweets$university)) {
  # Subset tweets for the current university
  uni_tweets <- tweets_filtered %>%
    filter(university == uni)

  tokens_lang <- uni_tweets %>%
    corpus(text_field = "full_text_emojis") %>%
    tokens(
      remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
      remove_url = TRUE, remove_separators = TRUE
    ) %>%
    tokens_tolower() %>%
    tokens_wordstem() %>%
    tokens_ngrams(n = 1) %>%
    tokens_select(
      pattern =
        c(
          stopwords("en"), stopwords("de"),
          stopwords("fr"), stopwords("it"), extended_stopwords
        ), selection = "remove"
    )
  # Create Data Frame Matrix for uni with all languages
  uni_dfm <- dfm(tokens_lang)
  # Word Frequencies
  uni_word_freqs <- sort(colSums(uni_dfm), decreasing = TRUE)
  # Print most common words (the 'right' emoji appears often); inside a loop,
  # output is only shown when wrapped in print()
  print(head(uni_word_freqs, 20))
  print(wordcloud2(data.frame(
    word = names(uni_word_freqs),
    freq = uni_word_freqs
  ), size = 0.5))
}

User Reaction Analysis

# Calculate a 'weighted engagement' metric
tweets <- tweets %>%
  mutate(
    weighted_engagement = favorite_count * 1 + retweet_count * 2
  )
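The weighting can be checked in isolation: a retweet counts twice as much as a like (the counts below are made up):

```r
# Hypothetical counts: weighted engagement = likes * 1 + retweets * 2
favorite_count <- c(10, 0, 3)
retweet_count  <- c(2, 5, 0)
favorite_count * 1 + retweet_count * 2  # -> 14 10 3
```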

# Identify tweets with the highest weighted engagement
most_engaged_tweets <- tweets %>%
  arrange(desc(weighted_engagement)) %>%
  head(1000) # Top 1000 for analysis

# Analyze posting time of most engaged tweets (same as before)
most_engaged_tweets_time <- most_engaged_tweets %>%
  mutate(time_of_day = format(created_at, "%H"))

ggplot(most_engaged_tweets_time, aes(x = as.numeric(time_of_day))) +
  geom_histogram(binwidth = 1, fill = "lightblue", color = "blue") +
  labs(
    title = "Distribution of Posting Times for Most Engaged Tweets",
    x = "Hour of Day",
    y = "Frequency"
  )

Analyze the content of the most-liked tweets

# Preprocessing content of most liked tweets
tokens_most_engaged <- most_engaged_tweets %>%
  corpus(text_field = "full_text_emojis") %>%
  tokens(
    remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
    remove_url = TRUE, remove_separators = TRUE
  ) %>%
  tokens_tolower() %>%
  # These tweets mix languages; sel_lang here would be a stale value left over
  # from the earlier loop, so stem with the default (English) stemmer
  tokens_wordstem() %>%
  tokens_ngrams(n = 1) %>%
  tokens_select(
    pattern =
      c(
        stopwords("en"), stopwords("de"),
        stopwords("fr"), stopwords("it"), extended_stopwords
      ), selection = "remove"
  )
tokens_most_engaged_dfm <- dfm(tokens_most_engaged)
freqs_most_engaged <- sort(colSums(tokens_most_engaged_dfm), decreasing = TRUE)
# Print most common words: the 'right' emoji (backhand index pointing right) appears often
head(freqs_most_engaged, 20)
##            mehr            neue         schweiz       schweizer           right 
##              81              67              48              47              46 
##            heut           zeigt #hsluinformatik           studi            zhaw 
##              44              41              40              39              39 
##          hes-so           knnen           neuen       hochschul          campus 
##              38              38              36              34              33 
##           innov            gibt              ab      entwickelt             bfh 
##              31              30              30              30              30
set.seed(123)
wordcloud2(data.frame(
  word = names(freqs_most_engaged),
  freq = freqs_most_engaged
), size = 0.5)

Question 3: How do the university tweets differ in terms of content, style, emotions, etc?

Content Analysis (Word Clouds)

for (uni in unique(tweets$university)) {
  uni_tweets <- tweets %>%
    filter(university == uni, lang %in% langs)
  tokens_uni <- uni_tweets %>%
    corpus(text_field = "full_text_emojis") %>%
    tokens(
      remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
      remove_url = TRUE, remove_separators = TRUE
    ) %>%
    tokens_tolower() %>%
    tokens_wordstem() %>%
    tokens_ngrams(n = 1) %>%
    tokens_select(
      pattern =
        c(
          stopwords("en"), stopwords("de"),
          stopwords("fr"), stopwords("it"), extended_stopwords
        ), selection = "remove"
    )
  uni_dfm <- dfm(tokens_uni)
  freqs_uni <- sort(colSums(uni_dfm), decreasing = TRUE)
  # Print most common words (the 'right' emoji appears often); inside a loop,
  # output is only shown when wrapped in print()
  print(head(freqs_uni, 20))
  set.seed(123)
  print(wordcloud2(data.frame(
    word = names(freqs_uni),
    freq = freqs_uni
  ), size = 0.5))

  # Analyze Top Emojis by University
  emoji_count_per_university <- uni_tweets %>%
    top_n_emojis(full_text)

  print(emoji_count_per_university)

  # ggplot objects are not auto-printed inside a loop, so wrap in print()
  print(
    emoji_count_per_university %>%
      mutate(emoji_name = reorder(emoji_name, n)) %>%
      ggplot(aes(n, emoji_name)) +
      geom_col() +
      labs(x = "Count", y = NULL, title = "Top 20 Emojis Used")
  )
}
## # A tibble: 20 × 4
##    emoji_name                    unicode emoji_category        n
##    <chr>                         <chr>   <chr>             <int>
##  1 backhand_index_pointing_right 👉      People & Body        56
##  2 yellow_heart                  💛      Smileys & Emotion    34
##  3 black_heart                   🖤      Smileys & Emotion    32
##  4 woman                         👩      People & Body        28
##  5 man                           👨      People & Body        17
##  6 clap                          👏      People & Body        16
##  7 flag_Switzerland              🇨🇭      Flags                15
##  8 microscope                    🔬      Objects              15
##  9 computer                      💻      Objects              14
## 10 graduation_cap                🎓      Objects              13
## 11 school                        🏫      Travel & Places      13
## 12 face_with_medical_mask        😷      Smileys & Emotion    12
## 13 raised_hands                  🙌      People & Body        12
## 14 robot                         🤖      Smileys & Emotion    12
## 15 female_sign                   ♀️       Symbols              10
## 16 trophy                        🏆      Activities            9
## 17 woman_scientist               👩‍🔬      People & Body         9
## 18 party_popper                  🎉      Activities            8
## 19 star_struck                   🤩      Smileys & Emotion     8
## 20 sun_with_face                 🌞      Travel & Places       8
## # A tibble: 20 × 4
##    emoji_name                      unicode emoji_category        n
##    <chr>                           <chr>   <chr>             <int>
##  1 party_popper                    🎉      Activities           18
##  2 grinning_face_with_big_eyes     😃      Smileys & Emotion    15
##  3 blush                           😊      Smileys & Emotion     8
##  4 smiling_face_with_sunglasses    😎      Smileys & Emotion     8
##  5 bulb                            💡      Objects               7
##  6 +1                              👍      People & Body         6
##  7 camera_flash                    📸      Objects               6
##  8 flexed_biceps                   💪      People & Body         6
##  9 four_leaf_clover                🍀      Animals & Nature      6
## 10 grinning_face_with_smiling_eyes 😄      Smileys & Emotion     6
## 11 heart_eyes                      😍      Smileys & Emotion     6
## 12 hugs                            🤗      Smileys & Emotion     6
## 13 female_sign                     ♀️       Symbols               4
## 14 graduation_cap                  🎓      Objects               4
## 15 grinning                        😀      Smileys & Emotion     4
## 16 robot                           🤖      Smileys & Emotion     4
## 17 backhand_index_pointing_down    👇      People & Body         3
## 18 computer                        💻      Objects               3
## 19 lady_beetle                     🐞      Animals & Nature      3
## 20 ocean                           🌊      Travel & Places       3
## # A tibble: 20 × 4
##    emoji_name                    unicode emoji_category        n
##    <chr>                         <chr>   <chr>             <int>
##  1 backhand_index_pointing_right 👉      People & Body        21
##  2 high_voltage                  ⚡      Travel & Places      11
##  3 wink                          😉      Smileys & Emotion     9
##  4 clap                          👏      People & Body         5
##  5 flag_Switzerland              🇨🇭      Flags                 5
##  6 rocket                        🚀      Travel & Places       5
##  7 +1                            👍      People & Body         4
##  8 arrow_right                   ➡️       Symbols               4
##  9 bug                           🐛      Animals & Nature      3
## 10 computer                      💻      Objects               3
## 11 flexed_biceps                 💪      People & Body         3
## 12 man                           👨      People & Body         3
## 13 bangbang                      ‼️       Symbols               2
## 14 dark_skin_tone                🏿      Component             2
## 15 exclamation                   ❗      Symbols               2
## 16 female_sign                   ♀️       Symbols               2
## 17 four_leaf_clover              🍀      Animals & Nature      2
## 18 green_salad                   🥗      Food & Drink          2
## 19 grinning                      😀      Smileys & Emotion     2
## 20 medium_light_skin_tone        🏼      Component             2
## # A tibble: 20 × 4
##    emoji_name                    unicode emoji_category        n
##    <chr>                         <chr>   <chr>             <int>
##  1 backhand_index_pointing_right 👉      People & Body        49
##  2 battery                       🔋      Objects              16
##  3 backhand_index_pointing_down  👇      People & Body        12
##  4 woman                         👩      People & Body        12
##  5 palm_tree                     🌴      Animals & Nature     11
##  6 bulb                          💡      Objects              10
##  7 computer                      💻      Objects              10
##  8 evergreen_tree                🌲      Animals & Nature     10
##  9 graduation_cap                🎓      Objects              10
## 10 party_popper                  🎉      Activities           10
## 11 robot                         🤖      Smileys & Emotion    10
## 12 clap                          👏      People & Body         9
## 13 coconut                       🥥      Food & Drink          9
## 14 date                          📅      Objects               9
## 15 deciduous_tree                🌳      Animals & Nature      9
## 16 flag_Switzerland              🇨🇭      Flags                 9
## 17 rocket                        🚀      Travel & Places       9
## 18 automobile                    🚗      Travel & Places       8
## 19 clinking_glasses              🥂      Food & Drink          8
## 20 seedling                      🌱      Animals & Nature      8
## # A tibble: 20 × 4
##    emoji_name                    unicode emoji_category      n
##    <chr>                         <chr>   <chr>           <int>
##  1 arrow_right                   ➡️       Symbols           320
##  2 arrow_heading_down            ⤵️       Symbols           245
##  3 book                          📖      Objects           115
##  4 mag_right                     🔎      Objects            97
##  5 mega                          📣      Objects            53
##  6 clapper                       🎬      Objects            38
##  7 NEW_button                    🆕      Symbols            35
##  8 computer                      💻      Objects            35
##  9 microscope                    🔬      Objects            32
## 10 bulb                          💡      Objects            29
## 11 police_car_light              🚨      Travel & Places    27
## 12 backhand_index_pointing_right 👉      People & Body      26
## 13 graduation_cap                🎓      Objects            23
## 14 studio_microphone             🎙️       Objects            23
## 15 clap                          👏      People & Body      21
## 16 date                          📅      Objects            17
## 17 medal_sports                  🏅      Activities         15
## 18 memo                          📝      Objects            15
## 19 woman                         👩      People & Body      15
## 20 flag_Switzerland              🇨🇭      Flags              14
## # A tibble: 20 × 4
##    emoji_name                   unicode emoji_category        n
##    <chr>                        <chr>   <chr>             <int>
##  1 sparkles                     ✨      Activities           28
##  2 flag_Switzerland             🇨🇭      Flags                18
##  3 rocket                       🚀      Travel & Places      12
##  4 party_popper                 🎉      Activities           11
##  5 partying_face                🥳      Smileys & Emotion     9
##  6 Christmas_tree               🎄      Activities            7
##  7 clap                         👏      People & Body         7
##  8 star                         ⭐      Travel & Places       7
##  9 bottle_with_popping_cork     🍾      Food & Drink          6
## 10 bulb                         💡      Objects               5
## 11 glowing_star                 🌟      Travel & Places       5
## 12 smiling_face_with_sunglasses 😎      Smileys & Emotion     5
## 13 +1                           👍      People & Body         4
## 14 camera_flash                 📸      Objects               4
## 15 clinking_glasses             🥂      Food & Drink          4
## 16 four_leaf_clover             🍀      Animals & Nature      4
## 17 musical_notes                🎶      Objects               4
## 18 person_running               🏃      People & Body         4
## 19 raised_hands                 🙌      People & Body         4
## 20 robot                        🤖      Smileys & Emotion     4
## # A tibble: 20 × 4
##    emoji_name                    unicode emoji_category        n
##    <chr>                         <chr>   <chr>             <int>
##  1 graduation_cap                🎓      Objects               3
##  2 man                           👨      People & Body         2
##  3 man_student                   👨‍🎓      People & Body         2
##  4 rocket                        🚀      Travel & Places       2
##  5 snowflake                     ❄️       Travel & Places       2
##  6 backhand_index_pointing_right 👉      People & Body         1
##  7 brain                         🧠      People & Body         1
##  8 chocolate_bar                 🍫      Food & Drink          1
##  9 clapper                       🎬      Objects               1
## 10 eyes                          👀      People & Body         1
## 11 fire                          🔥      Travel & Places       1
## 12 flexed_biceps                 💪      People & Body         1
## 13 grinning                      😀      Smileys & Emotion     1
## 14 heart_eyes_cat                😻      Smileys & Emotion     1
## 15 high_voltage                  ⚡      Travel & Places       1
## 16 mantelpiece_clock             🕰️       Travel & Places       1
## 17 sleeping                      😴      Smileys & Emotion     1
## 18 slightly_smiling_face         🙂      Smileys & Emotion     1
## 19 sun                           ☀️       Travel & Places       1
## 20 woman                         👩      People & Body         1
## # A tibble: 20 × 4
##    emoji_name                    unicode emoji_category        n
##    <chr>                         <chr>   <chr>             <int>
##  1 arrow_right                   ➡️       Symbols              83
##  2 backhand_index_pointing_right 👉      People & Body        21
##  3 graduation_cap                🎓      Objects              19
##  4 arrow_forward                 ▶️       Symbols              18
##  5 bulb                          💡      Objects              10
##  6 rocket                        🚀      Travel & Places       9
##  7 party_popper                  🎉      Activities            8
##  8 flag_Switzerland              🇨🇭      Flags                 7
##  9 clap                          👏      People & Body         6
## 10 exclamation                   ❗      Symbols               5
## 11 SOON_arrow                    🔜      Symbols               4
## 12 grinning_face_with_big_eyes   😃      Smileys & Emotion     4
## 13 camera_flash                  📸      Objects               3
## 14 computer                      💻      Objects               3
## 15 movie_camera                  🎥      Objects               3
## 16 rainbow                       🌈      Travel & Places       3
## 17 studio_microphone             🎙️       Objects               3
## 18 woman                         👩      People & Body         3
## 19 Christmas_tree                🎄      Activities            2
## 20 backhand_index_pointing_down  👇      People & Body         2
# Generate general tokens for bigram and trigram analysis
tokens <- tweets %>%
  corpus(text_field = "full_text_emojis") %>%
  tokens(
    remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
    remove_url = TRUE, remove_separators = TRUE
  ) %>%
  tokens_tolower() %>%
  tokens_wordstem() %>%
  tokens_select(
    pattern =
      c(
        stopwords("en"), stopwords("de"),
        stopwords("fr"), stopwords("it"), extended_stopwords
      ), selection = "remove"
  )
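Before moving on to n-grams, the cleaning pipeline can be sanity-checked on toy input. This is a minimal sketch with made-up German sentences (not part of the corpus) using the same quanteda calls as above, minus stemming:

```r
library(quanteda)

# Toy sanity check of the tokenization pipeline on two made-up sentences
toy_texts <- c(
  "Die Hochschule Luzern startet ein neues Projekt",
  "Ein neues Projekt an der Hochschule"
)
toy_tokens <- tokens(toy_texts, remove_punct = TRUE)
toy_tokens <- tokens_tolower(toy_tokens)
toy_tokens <- tokens_remove(toy_tokens, stopwords("de"))

# Top features after cleaning: stopwords are gone, content words remain
topfeatures(dfm(toy_tokens), 5)
```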
# Bigram Wordcloud
bi_gram_tokens <- tokens_ngrams(tokens, n = 2)
dfm_bi_gram <- dfm(bi_gram_tokens)
freqs_bi_gram <- sort(colSums(dfm_bi_gram), decreasing = TRUE)
head(freqs_bi_gram, 20)
##          right_arrow             htw_chur          index_point 
##                  421                  259                  207 
##       backhand_index     hochschul_luzern          point_right 
##                  206                  185                  183 
## berner_fachhochschul        sozial_arbeit              prof_dr 
##                  157                  154                  142 
##            haut_cole      herzlich_gratul            open_book 
##                  141                  139                  117 
##        magnifi_glass           glass_tilt           tilt_right 
##                   97                   97                   97 
##         fh_graubnden  neusten_blogbeitrag   book_#revuehmisphr 
##                   91                   87                   85 
##         social_media         advanc_studi 
##                   84                   83
# Create the bigram word cloud
set.seed(123)
wordcloud2(data.frame(
  word = names(freqs_bi_gram),
  freq = freqs_bi_gram
), size = 0.5)
# Trigram Wordcloud
tri_gram_tokens <- tokens_ngrams(tokens, n = 3)
dfm_tri_gram <- dfm(tri_gram_tokens)
freqs_tri_gram <- sort(colSums(dfm_tri_gram), decreasing = TRUE)
head(freqs_tri_gram, 20)
##         backhand_index_point            index_point_right 
##                          206                          183 
##           magnifi_glass_tilt             glass_tilt_right 
##                           97                           97 
##      open_book_#revuehmisphr   hochschul_gestaltung_kunst 
##                           85                           62 
## dipartimento_tecnologi_innov          master_advanc_studi 
##                           40                           38 
##         depart_sozial_arbeit       #infoanlass_mrz_findet 
##                           36                           33 
##              polic_car_light         univers_appli_scienc 
##                           32                           31 
##         busi_administr_statt     findet_#zrich_infoanlass 
##                           30                           30 
##               tag_offenen_tr        hochschul_life_scienc 
##                           29                           29 
##        gestaltung_kunst_fhnw           mas_busi_administr 
##                           29                           28 
##       mehr_neuen_blogbeitrag     mehr_neusten_blogbeitrag 
##                           28                           28
# Create the trigram word cloud
set.seed(123)
wordcloud2(data.frame(
  word = names(freqs_tri_gram),
  freq = freqs_tri_gram
), size = 0.5)

LDA Topic Modeling

# Source: Christoph Zangger -> removes all rows that contain only zeros
new_dfm <- dfm_subset(dfm_list$en, ntoken(dfm_list$en) > 0)
tweet_lda <- LDA(new_dfm, k = 5, control = list(seed = 123))
# Tidy the LDA results
topic_terms <- tidy(tweet_lda, matrix = "beta")
# Extract topics and top terms
topics <- as.data.frame(terms(tweet_lda, 50)) # First fifty words per topic

# Extract top terms per topic
top_terms <- topic_terms %>%
  group_by(topic) %>%
  top_n(8, beta) %>% # Show top 8 terms per topic
  ungroup() %>%
  arrange(topic, -beta)

# Visualize top terms per topic
top_terms %>%
  mutate(term = reorder_within(term, beta, topic)) %>%
  ggplot(aes(beta, term, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~topic, scales = "free") +
  scale_y_reordered() +
  labs(
    x = "Beta (Term Importance within Topic)",
    y = NULL,
    title = "Top Terms per Topic in Tweets (LDA)"
  )

# Most distinctive words among topics 1-3 (pairwise log2 ratios of beta)
diff <- topic_terms %>%
  mutate(topic = paste0("topic", topic)) %>%
  spread(topic, beta) %>%
  filter(topic1 > .001 | topic2 > .001 | topic3 > .001) %>%
  mutate(
    logratio_t1t2 = log2(topic2 / topic1),
    logratio_t1t3 = log2(topic3 / topic1),
    logratio_t2t3 = log2(topic3 / topic2)
  )
diff
## # A tibble: 328 × 9
##    term       topic1  topic2  topic3  topic4  topic5 logratio_t1t2 logratio_t1t3
##    <chr>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>         <dbl>         <dbl>
##  1 @academi… 1.96e-3 5.43e-4 1.53e-3 3.59e-3 3.22e-3        -1.86         -0.358
##  2 @bfh_hesb 1.60e-3 3.51e-3 2.70e-3 4.10e-3 3.54e-3         1.13          0.757
##  3 @ch_univ… 3.20e-4 1.28e-3 6.28e-4 1.06e-4 1.67e-4         2.01          0.975
##  4 @fh_grau… 1.47e-3 4.17e-4 1.24e-4 8.18e-4 1.76e-3        -1.82         -3.58 
##  5 @fhnw     2.59e-3 7.56e-3 7.94e-4 1.15e-3 7.12e-3         1.55         -1.70 
##  6 @fhnwbusi 5.02e-3 3.18e-3 2.27e-3 5.57e-3 6.68e-4        -0.659        -1.15 
##  7 @globalc… 1.89e-3 4.31e-4 2.40e-4 2.90e-4 7.17e-5        -2.13         -2.97 
##  8 @greater… 1.96e-3 1.90e-4 2.21e-3 8.21e-4 2.32e-4        -3.37          0.172
##  9 @grstift… 9.60e-4 2.17e-3 1.41e-3 1.92e-3 2.31e-3         1.17          0.559
## 10 @hes_so   4.52e-4 1.27e-3 3.09e-3 9.90e-4 2.13e-3         1.49          2.77 
## # ℹ 318 more rows
## # ℹ 1 more variable: logratio_t2t3 <dbl>
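The log ratios above are easiest to read when plotted. The sketch below uses toy beta values (hypothetical, standing in for the real `diff` table) to show the usual bar-chart presentation of the most topic-discriminating terms:

```r
library(dplyr)
library(ggplot2)

# Toy per-topic beta values standing in for the spread `diff` table
toy_diff <- tibble(
  term   = c("campus", "research", "student", "event"),
  topic1 = c(0.004, 0.001, 0.003, 0.002),
  topic2 = c(0.001, 0.005, 0.002, 0.004)
) %>%
  mutate(logratio_t1t2 = log2(topic2 / topic1))

# Bars to the right favor topic 2, bars to the left favor topic 1
toy_diff %>%
  mutate(term = reorder(term, logratio_t1t2)) %>%
  ggplot(aes(logratio_t1t2, term)) +
  geom_col() +
  labs(x = "log2 ratio of beta (topic 2 / topic 1)", y = NULL)
```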
# LDA Topic Modeling for each university
universities <- unique(tweets$university)

for (uni in universities) {
  # Filter tweets for the current university
  uni_tweets <- tweets %>% filter(university == uni)

  tokens_uni <- uni_tweets %>%
    corpus(text_field = "full_text_emojis") %>%
    tokens(
      remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
      remove_url = TRUE, remove_separators = TRUE
    ) %>%
    tokens_tolower() %>%
    tokens_wordstem() %>%
    tokens_ngrams(n = 1) %>%
    tokens_select(
      pattern =
        c(
          stopwords("en"), stopwords("de"),
          stopwords("fr"), stopwords("it"), extended_stopwords
        ), selection = "remove"
    )
  uni_dfm <- dfm(tokens_uni)
  # Apply LDA
  uni_dfm <- dfm_subset(uni_dfm, ntoken(uni_dfm) > 0)
  tweet_lda <- LDA(uni_dfm, k = 5, control = list(seed = 123))
  # Tidy the LDA results
  tweet_lda_td <- tidy(tweet_lda)
  # Extract top terms per topic
  top_terms <- tweet_lda_td %>%
    group_by(topic) %>%
    top_n(8, beta) %>%
    ungroup() %>%
    arrange(topic, -beta)
  # Visualize top terms per topic
  p <- top_terms %>%
    mutate(term = reorder_within(term, beta, topic)) %>%
    ggplot(aes(beta, term, fill = factor(topic))) +
    geom_col(show.legend = FALSE) +
    facet_wrap(~topic, scales = "free") +
    scale_y_reordered() +
    labs(
      x = "Beta (Term Importance within Topic)",
      y = NULL,
      title = paste("Top Terms per Topic in Tweets from", uni, "(LDA)")
    )
  print(p)
  # Topic Model Summary: top 10 terms per topic
  cat("\nTopic Model Summary for", uni, ":\n")
  print(as.data.frame(terms(tweet_lda, 10)))
}

## 
## Topic Model Summary for FHNW :
##           Topic 1        Topic 2    Topic 3         Topic 4      Topic 5
## 1        @hsafhnw      @fhnwbusi       fhnw            fhnw    @fhnwbusi
## 2            fhnw       @hsafhnw      @fhnw       @fhnwbusi @fhnwtechnik
## 3           swiss           mehr     campus       hochschul        @fhnw
## 4        challeng           fhnw       mehr    @fhnwtechnik     @hsafhnw
## 5          morgen      hochschul       heut @fhnwpsychologi         mehr
## 6            neue   @fhnwtechnik studierend         projekt      schweiz
## 7  brugg-windisch     studierend  hochschul            neue        neuen
## 8            mehr           prof         ab           index        kunst
## 9           olten             dr   @hsafhnw           basel       campus
## 10         erklrt brugg-windisch       neue           @fhnw         heut

## 
## Topic Model Summary for FH_Graubuenden :
##        Topic 1      Topic 2    Topic 3  Topic 4         Topic 5
## 1         chur  blogbeitrag        htw    statt           statt
## 2  #infoanlass         mehr   #htwchur #htwchur            chur
## 3     #htwchur        neuen       chur   findet          findet
## 4          htw   infoanlass       busi    onlin        #htwchur
## 5  blogbeitrag       findet    neusten      htw           #fhgr
## 6    graubnden        #fhgr   #studium    manag            mehr
## 7        #fhgr    graubnden  graubnden     heut       graubnden
## 8   #graubnden         chur      manag     mehr @suedostschweiz
## 9        thema @htwchurtour infoanlass  product             htw
## 10       #chur           fh    studium    #chur     blogbeitrag

## 
## Topic Model Summary for ZHAW :
##             Topic 1          Topic 2          Topic 3         Topic 4
## 1              zhaw            @zhaw            #zhaw @iam_winterthur
## 2              dank @engineeringzhaw       winterthur         schweiz
## 3  @engineeringzhaw             neue             heut           knnen
## 4             @zhaw        @sml_zhaw             neue            mehr
## 5        winterthur       studierend @engineeringzhaw           #zhaw
## 6             thema        schweizer               cc            neue
## 7         schweizer      #zhawimpact             gibt              cc
## 8             zeigt      @c_caviglia      #zhawimpact           neuen
## 9              heut              via              via           studi
## 10              via               cc  @iam_winterthur       schweizer
##             Topic 5
## 1              zhaw
## 2  @engineeringzhaw
## 3                cc
## 4              mehr
## 5             zeigt
## 6              heut
## 7         @sml_zhaw
## 8             knnen
## 9             studi
## 10             gibt

## 
## Topic Model Summary for bfh :
##          Topic 1       Topic 2    Topic 3   Topic 4         Topic 5
## 1            bfh           bfh       bern       bfh            mehr
## 2           biel         thema       neue      neue             bfh
## 3           bern        berner     berner @bfh_hesb #knoten_maschen
## 4           mehr     @bfh_hesb       mehr      bern            bern
## 5         arbeit fachhochschul      innen     thema          berner
## 6        projekt      @hkb_bfh      neuen   zukunft       schweizer
## 7       @hkb_bfh            ab      zeigt   projekt       @bfh_hesb
## 8  fachhochschul      erfahren      thema   digital           knnen
## 9          zeigt          biel      studi      geht        anmelden
## 10         index         statt nachhaltig       neu            neue

## 
## Topic Model Summary for hes_so :
##          Topic 1  Topic 2  Topic 3     Topic 4     Topic 5
## 1          arrow   hes-so   hes-so     @hes_so       right
## 2            dan    right     haut      projet      projet
## 3          right    arrow     cole      master       arrow
## 4         projet  @hes_so  tudiant     tudiant @hessovalai
## 5           book      dan      dan        open        haut
## 6          suiss     tilt    arrow @hessovalai  professeur
## 7  #revuehmisphr  #hes_so  #hes_so       right      domain
## 8        nouvell recherch     plus      hes-so      master
## 9    @hessovalai  travail programm       diplm        open
## 10      dcouvrez    glass recherch    recherch     nouveau

## 
## Topic Model Summary for hslu :
##            Topic 1         Topic 2            Topic 3         Topic 4
## 1            @hslu       hochschul    #hsluinformatik          luzern
## 2             mehr          luzern              studi            mehr
## 3           luzern           @hslu          interview           @hslu
## 4             neue            mehr              zeigt            neue
## 5  #hsluinformatik         schweiz             depart          finden
## 6           depart            heut           menschen      #hslumusik
## 7           design #hsluinformatik    #hsluwirtschaft            heut
## 8         bachelor         projekt             design         studium
## 9             jahr            gibt #hslusozialearbeit #hsluwirtschaft
## 10       schweizer       schweizer              neuen       digitalen
##       Topic 5
## 1       @hslu
## 2       zeigt
## 3       welch
## 4      luzern
## 5        heut
## 6       knnen
## 7      depart
## 8   schweizer
## 9  studierend
## 10      digit

## 
## Topic Model Summary for ost_fh :
##                       Topic 1                    Topic 2
## 1  #ostschweizerfachhochschul                   @ozg_ost
## 2                     @ost_fh #ostschweizerfachhochschul
## 3                 #informatik                    @ost_fh
## 4                         ost                 ostschweiz
## 5                   st.gallen                        ost
## 6                         neu                       neue
## 7                  @eastdigit              fachhochschul
## 8      #wirtschaftsinformatik                       drei
## 9                    @itrockt                 #countdown
## 10                    podcast                      thema
##                       Topic 3                    Topic 4
## 1                         ost                        ost
## 2  #ostschweizerfachhochschul #ostschweizerfachhochschul
## 3                     @ost_fh                       mehr
## 4                  rapperswil                      leben
## 5                    bachelor                 rapperswil
## 6                      campus               kulturzyklus
## 7                         fhs                    podcast
## 8                   st.gallen                      onlin
## 9                      alumni                   kontrast
## 10                      statt                    @ost_fh
##                       Topic 5
## 1                     @ost_fh
## 2  #ostschweizerfachhochschul
## 3                        mehr
## 4                     @ost_wi
## 5                         ost
## 6                     projekt
## 7                   schweizer
## 8                       neuen
## 9                       thema
## 10                   @ozg_ost

## 
## Topic Model Summary for supsi_ch :
##      Topic 1      Topic 2      Topic 3      Topic 4      Topic 5
## 1      arrow        supsi        supsi     #supsiev         info
## 2      right    #supsinew    formazion        supsi           pi
## 3      supsi     studenti     progetto     bachelor     #supsiev
## 4   #supsiev         oggi        studi    formazion        manag
## 5   progetto       master    #supsinew       master   iscrizioni
## 6  #supsinew          deg        nuovo    tecnologi    formazion
## 7   studenti   iscrizioni @usi_univers        corsi        nuovo
## 8  @supsi_ch informazioni    @supsi_ch       scopri dipartimento
## 9     stream         busi       campus @usi_univers      novembr
## 10  svizzera   ingegneria           pi    @supsi_ch         tema

Style Analysis

tweets %>%
  mutate(tweet_length = nchar(full_text)) %>%
  ggplot(aes(x = tweet_length)) +
  geom_histogram() +
  labs(title = "Distribution of Tweet Lengths")
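Beyond the pooled histogram, tweet lengths can also be compared per sender. This is a minimal dplyr sketch on toy data (`tweets_demo` is hypothetical; the real call would group the `tweets` tibble by `university`):

```r
library(dplyr)

# Toy stand-in for the tweets tibble, with only the two columns needed here
tweets_demo <- tibble(
  university = c("FHNW", "FHNW", "ZHAW"),
  full_text  = c("Hello world", "A longer example tweet", "Short")
)

# Length summary per university, mirroring the histogram's nchar() measure
length_stats <- tweets_demo %>%
  mutate(tweet_length = nchar(full_text)) %>%
  group_by(university) %>%
  summarize(
    mean_length = mean(tweet_length),
    max_length  = max(tweet_length)
  )
length_stats
```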

### Sentiment Analysis

# Calculate Sentiment for Supported Languages Only
langs <- c("de", "fr", "it", "en")

tweets_filtered <- tweets %>%
  filter(lang %in% langs)

# Note: the syuzhet lexicon is English-only, so scores for
# non-English tweets are at best rough approximations
# Create Function to Get Syuzhet Sentiment
get_syuzhet_sentiment <- function(text, lang) {
  # Only score tweets in a supported language
  if (lang %in% langs) {
    return(get_sentiment(text, method = "syuzhet", lang = lang))
  } else {
    return(NA) # Return NA for unsupported languages
  }
}
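A quick toy check of the helper's behavior, restated here so the snippet runs standalone (it assumes the syuzhet package is installed; the sample sentences are made up):

```r
library(syuzhet)

langs <- c("de", "fr", "it", "en")
get_syuzhet_sentiment <- function(text, lang) {
  if (lang %in% langs) get_sentiment(text, method = "syuzhet") else NA
}

# English text gets a lexicon-based score; unknown language codes get NA
s_en <- get_syuzhet_sentiment("What a wonderful achievement!", "en")
s_na <- get_syuzhet_sentiment("Some text", "xx")
c(english = s_en, unsupported = s_na)
```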

# Calculate Syuzhet Sentiment for each Tweet
tweets_filtered$sentiment <-
  mapply(get_syuzhet_sentiment, tweets_filtered$full_text, tweets_filtered$lang)

plot_data <- tweets_filtered %>%
  group_by(university, month) %>%
  summarize(mean_sentiment_syuzhet = mean(sentiment, na.rm = TRUE))

# Plot Syuzhet Sentiment for all Universities
ggplot(plot_data, aes(
  x = month,
  y = mean_sentiment_syuzhet,
  color = university, group = university
)) +
  geom_line() +
  labs(
    title = "Mean Syuzhet Sentiment Over Time by University",
    y = "Mean Sentiment Score"
  ) +
  scale_x_datetime(date_breaks = "1 month", date_labels = "%Y-%m") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

for (uni in unique(tweets$university)) {
  uni_tweets <- tweets %>%
    filter(university == uni, lang == "en")

  uni_tweets$sentiment <-
    mapply(get_syuzhet_sentiment, uni_tweets$full_text, uni_tweets$lang)

  plot_data <- uni_tweets %>%
    group_by(month) %>%
    summarize(mean_sentiment = mean(sentiment, na.rm = TRUE))

  # Plot Syuzhet Sentiment Over Time (Per University)
  print(ggplot(plot_data, aes(x = month, y = mean_sentiment, group = 1)) +
    geom_line() +
    geom_smooth(method = "lm", se = FALSE, color = "red") +
    labs(
      title = paste0("Mean Syuzhet Sentiment Over Time - ", uni),
      y = "Mean Sentiment Score",
      x = "Month"
    ))

  # No per-language sentiment lexicon was available, so the word-level
  # analysis below uses the full_text_emojis column and is restricted to
  # English tweets
  # Tokenize and Preprocess Words
  uni_words_en <- uni_tweets %>%
    unnest_tokens(word, full_text_emojis) %>%
    anti_join(get_stopwords(language = "en"), by = "word") %>%
    distinct() %>%
    filter(nchar(word) > 3) %>%
    filter(!str_detect(word, "\\d")) %>%
    filter(!str_detect(word, "https?://\\S+|www\\.\\S+|t\\.co|http|https"))


  sentiment_words_en <- uni_words_en %>%
    mutate(
      sentiment = get_sentiment(word, method = "syuzhet")
    )

  # Separate Positive and Negative Words
  # (the >= 0 threshold also counts neutral words, score 0, as "positive",
  # which is why the lists below contain many neutral terms)
  positive_words_en <- sentiment_words_en %>%
    filter(sentiment >= 0) %>%
    count(word, sort = TRUE) %>%
    rename(freq = n)

  negative_words_en <- sentiment_words_en %>%
    filter(sentiment < 0) %>%
    count(word, sort = TRUE) %>%
    rename(freq = n)

  # Create and Display Word Clouds
  # positive words
  print(paste0("Positive words for: ", uni))
  print(head(positive_words_en, 20))
  print(wordcloud2(data.frame(
    word = positive_words_en$word,
    freq = positive_words_en$freq
  ), size = 0.5))

  print(paste0("Negative words for: ", uni))
  print(head(negative_words_en, 20))
  # negative words
  print(wordcloud2(data.frame(
    word = negative_words_en$word,
    freq = negative_words_en$freq
  ), size = 0.5))
}

## [1] "Positive words for: FHNW"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word          freq
##    <chr>      <chr>        <int>
##  1 FHNW       aacsb            1
##  2 FHNW       academy          1
##  3 FHNW       acbtxijqdl       1
##  4 FHNW       accelerate       1
##  5 FHNW       accross          1
##  6 FHNW       activities       1
##  7 FHNW       adorno           1
##  8 FHNW       agreement        1
##  9 FHNW       aline            1
## 10 FHNW       although         1
## 11 FHNW       ambassador       1
## 12 FHNW       america          1
## 13 FHNW       american         1
## 14 FHNW       among            1
## 15 FHNW       amxxbmlfyc       1
## 16 FHNW       anlass           1
## 17 FHNW       anmelden         1
## 18 FHNW       announce         1
## 19 FHNW       announcement     1
## 20 FHNW       announces        1
## [1] "Negative words for: FHNW"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word           freq
##    <chr>      <chr>         <int>
##  1 FHNW       avoid             1
##  2 FHNW       bacteria          1
##  3 FHNW       blatant           1
##  4 FHNW       blind             1
##  5 FHNW       boom              1
##  6 FHNW       breaking          1
##  7 FHNW       challenge         1
##  8 FHNW       cloud             1
##  9 FHNW       competition       1
## 10 FHNW       concerned         1
## 11 FHNW       devastating       1
## 12 FHNW       disadvantaged     1
## 13 FHNW       exhausted         1
## 14 FHNW       forget            1
## 15 FHNW       hype              1
## 16 FHNW       late              1
## 17 FHNW       launch            1
## 18 FHNW       limited           1
## 19 FHNW       mistakes          1
## 20 FHNW       outreach          1

## [1] "Positive words for: FH_Graubuenden"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university     word            freq
##    <chr>          <chr>          <int>
##  1 FH_Graubuenden able               1
##  2 FH_Graubuenden abroad             1
##  3 FH_Graubuenden abstract           1
##  4 FH_Graubuenden abstracts          1
##  5 FH_Graubuenden academic           1
##  6 FH_Graubuenden accu_rate          1
##  7 FH_Graubuenden across             1
##  8 FH_Graubuenden action             1
##  9 FH_Graubuenden activities         1
## 10 FH_Graubuenden address            1
## 11 FH_Graubuenden administration     1
## 12 FH_Graubuenden advantage          1
## 13 FH_Graubuenden advantages         1
## 14 FH_Graubuenden adventurous        1
## 15 FH_Graubuenden alliance           1
## 16 FH_Graubuenden almost             1
## 17 FH_Graubuenden alps               1
## 18 FH_Graubuenden already            1
## 19 FH_Graubuenden also               1
## 20 FH_Graubuenden alumnus            1
## [1] "Negative words for: FH_Graubuenden"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university     word         freq
##    <chr>          <chr>       <int>
##  1 FH_Graubuenden collective      1
##  2 FH_Graubuenden competition     1
##  3 FH_Graubuenden corruption      1
##  4 FH_Graubuenden countdown       1
##  5 FH_Graubuenden fall            1
##  6 FH_Graubuenden fallen          1
##  7 FH_Graubuenden fighting        1
##  8 FH_Graubuenden fire            1
##  9 FH_Graubuenden kick            1
## 10 FH_Graubuenden leave           1
## 11 FH_Graubuenden neglect         1
## 12 FH_Graubuenden neglecting      1
## 13 FH_Graubuenden problem         1
## 14 FH_Graubuenden quiz            1
## 15 FH_Graubuenden rainy           1
## 16 FH_Graubuenden risks           1
## 17 FH_Graubuenden spent           1
## 18 FH_Graubuenden strange         1
## 19 FH_Graubuenden stupidest       1
## 20 FH_Graubuenden sues            1

## [1] "Positive words for: ZHAW"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word          freq
##    <chr>      <chr>        <int>
##  1 ZHAW       _bengraziano     1
##  2 ZHAW       abilities        1
##  3 ZHAW       able             1
##  4 ZHAW       abroad           1
##  5 ZHAW       abstracts        1
##  6 ZHAW       academ           1
##  7 ZHAW       academic         1
##  8 ZHAW       acertainpain     1
##  9 ZHAW       achievement      1
## 10 ZHAW       across           1
## 11 ZHAW       actually         1
## 12 ZHAW       addition         1
## 13 ZHAW       additional       1
## 14 ZHAW       adespydvzf       1
## 15 ZHAW       admits           1
## 16 ZHAW       adopted          1
## 17 ZHAW       advise           1
## 18 ZHAW       advisory         1
## 19 ZHAW       afterwards       1
## 20 ZHAW       agarwaledu       1
## [1] "Negative words for: ZHAW"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word         freq
##    <chr>      <chr>       <int>
##  1 ZHAW       barrier         1
##  2 ZHAW       bastion         1
##  3 ZHAW       break           1
##  4 ZHAW       challenge       1
##  5 ZHAW       cold            1
##  6 ZHAW       competition     1
##  7 ZHAW       conflict        1
##  8 ZHAW       countdown       1
##  9 ZHAW       desert          1
## 10 ZHAW       economic        1
## 11 ZHAW       enough          1
## 12 ZHAW       entitled        1
## 13 ZHAW       fled            1
## 14 ZHAW       foreign         1
## 15 ZHAW       hack            1
## 16 ZHAW       hazard          1
## 17 ZHAW       hidden          1
## 18 ZHAW       ironic          1
## 19 ZHAW       missing         1
## 20 ZHAW       moan            1

## [1] "Positive words for: bfh"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word            freq
##    <chr>      <chr>          <int>
##  1 bfh        _mich_i            1
##  2 bfh        abend              1
##  3 bfh        able               1
##  4 bfh        academic           1
##  5 bfh        accepted           1
##  6 bfh        access             1
##  7 bfh        across             1
##  8 bfh        activities         1
##  9 bfh        addressing         1
## 10 bfh        administration     1
## 11 bfh        agriculture        1
## 12 bfh        alphasolarpro      1
## 13 bfh        alternative        1
## 14 bfh        always             1
## 15 bfh        amarenabrown       1
## 16 bfh        america's          1
## 17 bfh        analysis           1
## 18 bfh        andreasnaef        1
## 19 bfh        annetwh            1
## 20 bfh        announce           1
## [1] "Negative words for: bfh"
## # A tibble: 12 × 3
## # Groups:   university [1]
##    university word            freq
##    <chr>      <chr>          <int>
##  1 bfh        aggression         1
##  2 bfh        broken             1
##  3 bfh        competition        1
##  4 bfh        discrimination     1
##  5 bfh        fail               1
##  6 bfh        forget             1
##  7 bfh        inequality         1
##  8 bfh        labor              1
##  9 bfh        player             1
## 10 bfh        sorry              1
## 11 bfh        stole              1
## 12 bfh        stop               1

## [1] "Positive words for: hes_so"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word            freq
##    <chr>      <chr>          <int>
##  1 hes_so     academic           1
##  2 hes_so     active             1
##  3 hes_so     actors             1
##  4 hes_so     actualitesvd       1
##  5 hes_so     administration     1
##  6 hes_so     administrative     1
##  7 hes_so     admission          1
##  8 hes_so     advanced           1
##  9 hes_so     agencies           1
## 10 hes_so     agenda             1
## 11 hes_so     also               1
## 12 hes_so     amazing            1
## 13 hes_so     amman              1
## 14 hes_so     analyses           1
## 15 hes_so     anne_ramelet       1
## 16 hes_so     announce           1
## 17 hes_so     anounce            1
## 18 hes_so     antonio            1
## 19 hes_so     applications       1
## 20 hes_so     apply              1
## [1] "Negative words for: hes_so"
## # A tibble: 9 × 3
## # Groups:   university [1]
##   university word         freq
##   <chr>      <chr>       <int>
## 1 hes_so     confessions     1
## 2 hes_so     converted       1
## 3 hes_so     fade            1
## 4 hes_so     failure         1
## 5 hes_so     hard            1
## 6 hes_so     intense         1
## 7 hes_so     launch          1
## 8 hes_so     poor            1
## 9 hes_so     vice            1

## [1] "Positive words for: hslu"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word           freq
##    <chr>      <chr>         <int>
##  1 hslu       aacsb             1
##  2 hslu       able              1
##  3 hslu       abstract          1
##  4 hslu       academia          1
##  5 hslu       academic          1
##  6 hslu       accept            1
##  7 hslu       acceptance        1
##  8 hslu       access            1
##  9 hslu       according         1
## 10 hslu       account           1
## 11 hslu       accreditation     1
## 12 hslu       achieving         1
## 13 hslu       action            1
## 14 hslu       additions         1
## 15 hslu       address           1
## 16 hslu       advantage         1
## 17 hslu       afternoon         1
## 18 hslu       agflow            1
## 19 hslu       ahead             1
## 20 hslu       aims              1
## [1] "Negative words for: hslu"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word         freq
##    <chr>      <chr>       <int>
##  1 hslu       addiction       1
##  2 hslu       awaited         1
##  3 hslu       bacteria        1
##  4 hslu       challenge       1
##  5 hslu       competition     1
##  6 hslu       crashes         1
##  7 hslu       cut_up_tv       1
##  8 hslu       dark            1
##  9 hslu       dizzy           1
## 10 hslu       error           1
## 11 hslu       fail            1
## 12 hslu       fall            1
## 13 hslu       fears           1
## 14 hslu       fire            1
## 15 hslu       hack            1
## 16 hslu       laden           1
## 17 hslu       launch          1
## 18 hslu       missed          1
## 19 hslu       mistake         1
## 20 hslu       regulatory      1

## [1] "Positive words for: ost_fh"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word           freq
##    <chr>      <chr>         <int>
##  1 ost_fh     announce          1
##  2 ost_fh     august            1
##  3 ost_fh     backbone          1
##  4 ost_fh     based             1
##  5 ost_fh     beautiful         1
##  6 ost_fh     bridge            1
##  7 ost_fh     business          1
##  8 ost_fh     campus            1
##  9 ost_fh     closer            1
## 10 ost_fh     collaboration     1
## 11 ost_fh     cooling           1
## 12 ost_fh     curious           1
## 13 ost_fh     cyber             1
## 14 ost_fh     deadline          1
## 15 ost_fh     december          1
## 16 ost_fh     delighted         1
## 17 ost_fh     easily            1
## 18 ost_fh     eastern           1
## 19 ost_fh     electricity       1
## 20 ost_fh     emits             1
## [1] "Negative words for: ost_fh"
## # A tibble: 2 × 3
## # Groups:   university [1]
##   university word   freq
##   <chr>      <chr> <int>
## 1 ost_fh     dire      1
## 2 ost_fh     fire      1

## [1] "Positive words for: supsi_ch"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word             freq
##    <chr>      <chr>           <int>
##  1 supsi_ch   _dreicast           1
##  2 supsi_ch   abbg                1
##  3 supsi_ch   abbgroupnews        1
##  4 supsi_ch   abroad              1
##  5 supsi_ch   academ              1
##  6 supsi_ch   academia            1
##  7 supsi_ch   academic            1
##  8 supsi_ch   academies_ch        1
##  9 supsi_ch   access              1
## 10 supsi_ch   accoding            1
## 11 supsi_ch   achieve             1
## 12 supsi_ch   action              1
## 13 supsi_ch   activator           1
## 14 supsi_ch   activities          1
## 15 supsi_ch   address             1
## 16 supsi_ch   administration      1
## 17 supsi_ch   administrations     1
## 18 supsi_ch   admooajcib          1
## 19 supsi_ch   advanced            1
## 20 supsi_ch   advancedstudies     1
## [1] "Negative words for: supsi_ch"
## # A tibble: 20 × 3
## # Groups:   university [1]
##    university word         freq
##    <chr>      <chr>       <int>
##  1 supsi_ch   barrier         1
##  2 supsi_ch   cloud           1
##  3 supsi_ch   cold            1
##  4 supsi_ch   collision       1
##  5 supsi_ch   critical        1
##  6 supsi_ch   danger          1
##  7 supsi_ch   demand          1
##  8 supsi_ch   demanded        1
##  9 supsi_ch   distracts       1
## 10 supsi_ch   drug            1
## 11 supsi_ch   economic        1
## 12 supsi_ch   eth_rat         1
## 13 supsi_ch   fabrication     1
## 14 supsi_ch   fire            1
## 15 supsi_ch   foreign         1
## 16 supsi_ch   forget          1
## 17 supsi_ch   government      1
## 18 supsi_ch   hard            1
## 19 supsi_ch   intense         1
## 20 supsi_ch   launch          1

Question 4: What specific advice can you give us, as the communication department of BFH, based on your analysis? How can we integrate the analysis of tweets into our internal processes? Can you think of any data products that would be of value for us?

Summary of key insights from the analysis

# Language analysis: tweet counts per university and language
tweets %>%
  group_by(university) %>%
  count(lang) %>%
  arrange(desc(n))
## # A tibble: 127 × 3
## # Groups:   university [8]
##    university     lang      n
##    <chr>          <chr> <int>
##  1 hslu           de     2902
##  2 bfh            de     2760
##  3 ZHAW           de     2712
##  4 FHNW           de     2353
##  5 FH_Graubuenden de     2245
##  6 supsi_ch       it     1786
##  7 hes_so         fr     1663
##  8 FH_Graubuenden <NA>    374
##  9 ost_fh         de      248
## 10 ZHAW           <NA>    225
## # ℹ 117 more rows
# Emoji analysis: count emoji occurrences in the tweet texts
# (top_n_emojis() is the helper function defined earlier)
emoji_count <- tweets %>%
  top_n_emojis(full_text)

emoji_count %>%
  mutate(emoji_name = reorder(emoji_name, n)) %>%
  ggplot(aes(n, emoji_name)) +
  geom_col() +
  labs(x = "Count", y = NULL, title = "Top 20 Emojis Used")

insights <- list(
  "Most Active Hours" = hours_with_most_tweets_by_uni,
  "Most Active Days" = days_with_most_tweets_by_uni,
  "Content Analysis" = head(words_freqs_de),
  "Sentiment Analysis" = head(tweets_filtered$sentiment)
)

Recommendations:

1. Optimize tweet release times based on peak engagement hours.

2. Focus on specific days with high activity for important announcements.

3. Utilize sentiment analysis to tailor content that resonates positively with the audience.

4. Implement topic modeling to identify key themes and align communication strategy accordingly.
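Recommendation 4 could be prototyped with the standard tidytext/topicmodels workflow. A minimal sketch, assuming the `tweets` tibble from above; `k = 5` is an arbitrary starting point, and `stop_words` covers English only, so German, French, and Italian stop word lists would need to be added for this corpus:

```r
# Sketch: fit an LDA topic model on the tweet texts and inspect
# the top terms per topic to label the themes.
library(dplyr)
library(tidytext)
library(topicmodels)

# Tokenise, drop (English) stop words, build a document-term matrix
tweet_dtm <- tweets %>%
  unnest_tokens(word, full_text) %>%
  anti_join(stop_words, by = "word") %>%
  count(id_str, word) %>%
  cast_dtm(id_str, word, n)

tweet_lda <- LDA(tweet_dtm, k = 5, control = list(seed = 1234))

# Per-topic word probabilities (beta); top 10 terms per topic
tidy(tweet_lda, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 10) %>%
  ungroup()
```

The resulting topics could feed a recurring report for the communication team, flagging which themes drive engagement and how they shift over time.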